Collaborator

@Kubuxu Kubuxu commented Oct 9, 2025

No description provided.

@Kubuxu Kubuxu requested a review from a team as a code owner October 9, 2025 15:32
@Kubuxu Kubuxu force-pushed the pdp/create-and-upload branch from 64afc4a to 726cf4d Compare October 9, 2025 15:53
return fmt.Errorf("expected to find dataSetId in receipt but failed to extract: %w", err)
}
// XXX: I considered checking here if the dataset already exists in the DB, but I'm not sure if it is needed
}
Collaborator Author

The majority of my questions are around this file. I have yet to test it, but I'm using dataset=0 as a sentinel value.

It might be better to use NULL instead. Also, I don't know which table relations, if any, this might affect.
@LexLuthr

Member

I think you should look at handleGetPieceAdditionStatus; that's the only place I can see where it might matter. The client can ask for piece addition status, and the handler selects by data set id. What should the client ask for in the combined flow, and what do we expect them to get? I think maybe none of it works currently. The SDK calls this through getPieceAdditionStatus.

The only other place where we have functionality outside of this PR that impacts that table is the trigger we add for transaction resolving and it doesn't care about data set id:

CREATE OR REPLACE FUNCTION update_pdp_data_set_piece_adds()
RETURNS TRIGGER AS $$
BEGIN
    IF OLD.tx_status = 'pending' AND (NEW.tx_status = 'confirmed' OR NEW.tx_status = 'failed') THEN
        -- Update the add_message_ok field in pdp_data_set_piece_adds if a matching entry exists
        UPDATE pdp_data_set_piece_adds
        SET add_message_ok = CASE
            WHEN NEW.tx_status = 'failed' OR NEW.tx_success = FALSE THEN FALSE
            WHEN NEW.tx_status = 'confirmed' AND NEW.tx_success = TRUE THEN TRUE
            ELSE add_message_ok
        END
        WHERE add_message_hash = NEW.signed_tx_hash;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
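
For readers who prefer Go, here is a minimal sketch of the same decision as a pure function. It is an illustration only, not code from this PR: the names nextAddMessageOK, boolPtr and show are hypothetical, and nil pointers stand in for SQL NULL. Note that the data set id is not an input at all, which matches the point that the trigger doesn't care about it.

```go
package main

import "fmt"

// boolPtr returns a pointer to b; a nil *bool stands in for SQL NULL.
func boolPtr(b bool) *bool { return &b }

// nextAddMessageOK mirrors the trigger's CASE expression (hypothetical helper).
// current is the existing add_message_ok value; txSuccess models NEW.tx_success.
func nextAddMessageOK(oldStatus, newStatus string, txSuccess, current *bool) *bool {
	// The trigger only acts on a pending -> confirmed/failed transition.
	if oldStatus != "pending" || (newStatus != "confirmed" && newStatus != "failed") {
		return current
	}
	switch {
	case newStatus == "failed" || (txSuccess != nil && !*txSuccess):
		return boolPtr(false)
	case newStatus == "confirmed" && txSuccess != nil && *txSuccess:
		return boolPtr(true)
	default:
		// A NULL tx_success on a confirmed message leaves the value unchanged.
		return current
	}
}

// show renders a nullable boolean for printing.
func show(v *bool) string {
	if v == nil {
		return "NULL"
	}
	return fmt.Sprint(*v)
}

func main() {
	fmt.Println(show(nextAddMessageOK("pending", "confirmed", boolPtr(true), nil)))
	fmt.Println(show(nextAddMessageOK("pending", "failed", nil, nil)))
	fmt.Println(show(nextAddMessageOK("pending", "confirmed", nil, nil)))
}
```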

Collaborator Author

I could special-case dataset=0 in handleGetPieceAdditionStatus

Collaborator Author

I think the client should ask for the dataset creation status instead. After that, and after the piece gets processed, the get status will work because the pdp_data_set_piece_adds.data_set will get updated to the proper dataset.

@Kubuxu Kubuxu changed the title feat(pdp): support create and upload [WIP] feat(pdp): support create and upload Oct 9, 2025
Collaborator Author

Kubuxu commented Oct 10, 2025

@rvagg I would appreciate you taking a look as well.


type RequestBody struct {
	RecordKeeper string            `json:"recordKeeper"`
	Pieces       []AddPieceRequest `json:"pieces"`
Member

oh yeah, nicely typed

Member

rvagg commented Oct 10, 2025

Looks good, my only concern is the impact on handleGetPieceAdditionStatus

@rjan90 rjan90 linked an issue Oct 13, 2025 that may be closed by this pull request
@Kubuxu Kubuxu force-pushed the pdp/create-and-upload branch 3 times, most recently from 4ac5aa6 to 3224e22 Compare October 14, 2025 13:58
Collaborator Author

Kubuxu commented Oct 14, 2025

Looking at the schema of pdp_data_set_piece_adds, the insert will fail due to the foreign key pdp_proofset_root_adds_proofset_fkey, as the pdp_data_sets.id won't exist at that time.

[email protected]:yugabyte> \d pdp_data_set_piece_adds
+-------------------+---------+-------------------------+
| Column            | Type    | Modifiers               |
|-------------------+---------+-------------------------|
| data_set          | bigint  |  not null               |
| piece             | text    |  not null               |
| add_message_hash  | text    |  not null               |
| add_message_ok    | boolean |                         |
| add_message_index | bigint  |  not null               |
| sub_piece         | text    |  not null               |
| sub_piece_offset  | bigint  |  not null               |
| sub_piece_size    | bigint  |  not null               |
| pdp_pieceref      | bigint  |  not null               |
| pieces_added      | boolean |  not null default false |
+-------------------+---------+-------------------------+
Indexes:
    "pdp_data_set_piece_adds_pk" PRIMARY KEY, lsm (data_set HASH, add_message_hash ASC, add_message_index ASC)
    "idx_pdp_data_set_piece_adds_pieces_added" lsm (pieces_added HASH)
Foreign-key constraints:
    "pdp_proofset_root_adds_add_message_hash_fkey" FOREIGN KEY (add_message_hash) REFERENCES message_waits_eth(signed_tx_hash) ON DELETE CASCADE
    "pdp_proofset_root_adds_pdp_pieceref_fkey" FOREIGN KEY (pdp_pieceref) REFERENCES pdp_piecerefs(id) ON DELETE SET NULL
    "pdp_proofset_root_adds_proofset_fkey" FOREIGN KEY (data_set) REFERENCES pdp_data_sets(id) ON DELETE CASCADE

I welcome suggestions on how to solve this cleanly, but making data_set nullable seems like the cleanest option.

Collaborator Author

Kubuxu commented Oct 14, 2025

I pushed a commit with a nullable dataset id in pdp_data_set_piece_adds, otherwise, the foreign key constraint would get in the way.

Collaborator Author

Kubuxu commented Oct 14, 2025

Ohh, data_set cannot be nullable because it is part of the primary key.
Should we just remove it from the PK? I don't think we support flows where a single message adds to multiple different datasets, so that part of the uniqueness constraint is not needed.
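
A minimal Go sketch of that argument, assuming the reduced key would be (add_message_hash, add_message_index). That key shape is an assumption; the actual migration is not shown in this thread. The types pieceAdd and reducedKey and the function checkReducedKey are hypothetical names used for illustration. As long as one message only ever targets one data set, rows stay unique under the reduced key even while data_set is still NULL.

```go
package main

import (
	"database/sql"
	"fmt"
)

// pieceAdd models only the key-related columns of pdp_data_set_piece_adds.
type pieceAdd struct {
	DataSet         sql.NullInt64 // NULL until the data set exists
	AddMessageHash  string
	AddMessageIndex int64
}

// reducedKey is the assumed primary key once data_set is dropped from it.
type reducedKey struct {
	AddMessageHash  string
	AddMessageIndex int64
}

// checkReducedKey returns an error naming the first two rows that collide
// under the reduced key, or nil if every row remains unique.
func checkReducedKey(rows []pieceAdd) error {
	seen := make(map[reducedKey]int, len(rows))
	for i, r := range rows {
		k := reducedKey{AddMessageHash: r.AddMessageHash, AddMessageIndex: r.AddMessageIndex}
		if j, dup := seen[k]; dup {
			return fmt.Errorf("rows %d and %d collide on (%s, %d)",
				j, i, k.AddMessageHash, k.AddMessageIndex)
		}
		seen[k] = i
	}
	return nil
}

func main() {
	rows := []pieceAdd{
		{AddMessageHash: "0xaaa", AddMessageIndex: 0}, // data_set still NULL
		{AddMessageHash: "0xaaa", AddMessageIndex: 1}, // same message, next piece
		{DataSet: sql.NullInt64{Int64: 7, Valid: true}, AddMessageHash: "0xbbb", AddMessageIndex: 0},
	}
	fmt.Println(checkReducedKey(rows))
}
```

If a single message could add to several data sets, two rows could share the same hash and index and this check would fail, which is exactly the case the original key guarded against.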

@Kubuxu Kubuxu requested a review from rvagg October 15, 2025 10:45
Contributor

@LexLuthr LexLuthr left a comment

Looking at the table, I don't see any way to get around the problem while keeping the primary key.

Comment on lines 114 to 123
resolvedDataSetId := pieceAdd.DataSet
if !resolvedDataSetId.Valid {
	var err error
	resolvedDataSetId.Int64, err = extractDataSetIdFromReceipt(receipt)
	if err != nil {
		return fmt.Errorf("expected to find dataSetId in receipt but failed to extract: %w", err)
	}
	resolvedDataSetId.Valid = true
	var exists bool
	// we check if the dataset exists already to avoid a foreign key violation
	err = db.QueryRow(ctx, `
		SELECT EXISTS (
			SELECT 1
			FROM pdp_data_sets
			WHERE id = $1
		)`, resolvedDataSetId.Int64).Scan(&exists)
	if err != nil {
		return fmt.Errorf("failed to check if data set exists: %w", err)
	}
	if !exists {
		// XXX: maybe return nil instead to avoid warning?
		return fmt.Errorf("data set %d not found in pdp_data_sets", resolvedDataSetId.Int64)
	}
}
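
One possible way to handle the XXX above is sketched below, under stated assumptions. The receipt extraction and the existence query are passed in as plain functions so the example runs on its own; resolveDataSetID, dataSetRef, errDataSetNotReady and isNotReady are hypothetical names, not part of this PR. A sentinel error lets the caller tell "data set not created yet" apart from real failures and skip quietly instead of logging a warning.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// dataSetRef is the nullable data_set column value.
type dataSetRef = sql.NullInt64

// errDataSetNotReady marks the expected case where the add-piece trigger
// fired before the data set row was written (hypothetical sentinel).
var errDataSetNotReady = errors.New("data set not yet present in pdp_data_sets")

// resolveDataSetID returns the stored id when set; otherwise it extracts the
// id from the receipt and confirms the data set exists before using it.
func resolveDataSetID(
	stored dataSetRef,
	extract func() (int64, error), // stands in for extractDataSetIdFromReceipt
	exists func(int64) (bool, error), // stands in for the SELECT EXISTS query
) (int64, error) {
	if stored.Valid {
		return stored.Int64, nil
	}
	id, err := extract()
	if err != nil {
		return 0, fmt.Errorf("expected to find dataSetId in receipt but failed to extract: %w", err)
	}
	ok, err := exists(id)
	if err != nil {
		return 0, fmt.Errorf("failed to check if data set exists: %w", err)
	}
	if !ok {
		return 0, fmt.Errorf("data set %d: %w", id, errDataSetNotReady)
	}
	return id, nil
}

// isNotReady reports whether err means "retry on a later trigger".
func isNotReady(err error) bool { return errors.Is(err, errDataSetNotReady) }

func main() {
	extract := func() (int64, error) { return 42, nil }
	absent := func(int64) (bool, error) { return false, nil }

	_, err := resolveDataSetID(dataSetRef{}, extract, absent)
	fmt.Println(isNotReady(err)) // the caller can skip this row without a warning
}
```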
Contributor

I would prefer not to try to create_watch here as well. Just scan from DB and if it is NULL then just skip processing addPiece for this tipset. But that might cause an unnecessary delay of 1 tipset. Any other ideas for doing this without 2 watches?

Collaborator Author

Just scan from DB and if it is NULL then just skip processing addPiece for this tipset

That is what I'm doing here. If we get an add_piece triggered, and the dataset doesn't exist yet, we leave it alone and wait for another trigger for when the dataset is created.
A cleaner way might be to get create_watch to be processed with a higher priority than add_pieces. This could be achieved by combining them into one watcher and executing sequentially with create first.

Collaborator Author

After processing it sequentially through the combined watcher, I think we could obtain the dataset ID from the pdp_data_sets based on the create_message_hash.
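
A rough in-memory sketch of that combined watcher, to make the ordering concrete. Everything here is hypothetical (watcher, pendingAdd, processCreates, processAdds, processTipset), and it assumes that in the combined flow the add message hash equals the create message hash, since one message does both. Real code would read and write pdp_data_sets and pdp_data_set_piece_adds instead of maps and slices.

```go
package main

import (
	"database/sql"
	"fmt"
)

// pendingAdd models a pdp_data_set_piece_adds row awaiting processing.
type pendingAdd struct {
	MessageHash string
	DataSet     sql.NullInt64 // NULL in the create-and-upload flow
}

// watcher combines the create and add-piece steps into one pass.
type watcher struct {
	dataSetByCreateHash map[string]int64 // models pdp_data_sets by create_message_hash
	pending             []pendingAdd
}

// newWatcher starts with one pending add (NULL data_set) per message hash.
func newWatcher(hashes ...string) *watcher {
	w := &watcher{dataSetByCreateHash: map[string]int64{}}
	for _, h := range hashes {
		w.pending = append(w.pending, pendingAdd{MessageHash: h})
	}
	return w
}

// processCreates records data sets whose creation messages have landed.
func (w *watcher) processCreates(created map[string]int64) {
	for hash, id := range created {
		w.dataSetByCreateHash[hash] = id
	}
}

// processAdds fills in NULL data_set values by looking up the create message
// hash, and returns how many rows it resolved.
func (w *watcher) processAdds() int {
	resolved := 0
	for i := range w.pending {
		p := &w.pending[i]
		if p.DataSet.Valid {
			continue
		}
		if id, ok := w.dataSetByCreateHash[p.MessageHash]; ok {
			p.DataSet = sql.NullInt64{Int64: id, Valid: true}
			resolved++
		}
	}
	return resolved
}

// processTipset always runs creates before adds, so an add never observes a
// data set that was created in the same tipset as missing.
func (w *watcher) processTipset(created map[string]int64) int {
	w.processCreates(created)
	return w.processAdds()
}

func main() {
	w := newWatcher("0xabc")
	fmt.Println(w.processTipset(nil))                          // nothing created yet
	fmt.Println(w.processTipset(map[string]int64{"0xabc": 7})) // create lands, add resolves
	fmt.Println(w.pending[0].DataSet.Int64)
}
```

Running the two steps in a fixed order inside one watcher removes the one-tipset delay that two independent watches could introduce.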

Signed-off-by: Jakub Sztandera <[email protected]>
@Kubuxu Kubuxu force-pushed the pdp/create-and-upload branch from a17f27f to 44a4e47 Compare October 15, 2025 15:28
@Kubuxu Kubuxu force-pushed the pdp/create-and-upload branch from 44a4e47 to c18dbf2 Compare October 15, 2025 15:28
Development

Successfully merging this pull request may close these issues.

PDPv0: uploadAndCreate

3 participants